SciX ID included in output#96

Open
ehenneken wants to merge 3 commits into adsabs:master from ehenneken:SciXID
@ehenneken
Member

This PR represents the work done to include the SciX ID in the output of the service. To achieve this, the following was done:

  • Update config.py to include the Solr field scix_id in REFERENCE_SERVICE_QUERY_FIELDS_SOLR so that this field is always present when results (i.e. potential matches) are retrieved from Solr
  • Whether the /text or /xml endpoint is used to submit reference data, the actual processing always ends up in the function solve_for_fields. Here the contents of the Solr field scix_id need to be passed on when creating the Solution instance sent back
  • The Solution class needs to be augmented to support the inclusion of the SciX ID through the scix_id attribute. The string representation of class instances (__str__) needs to be updated to include the value of this attribute. This is because the result of the matching process is passed on as str(solve_reference(Hypotheses(parsed_ref))) (solve_reference returns a Solution instance).
  • The response to be sent back is generated in the format_resolved_reference function. This function parses the text string generated by stringifying the Solution instance and creates a JSON structure with the results of the matching
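
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the simplified constructor, the exact layout of the string, the helper name `parse_solution_string`, and the example identifier values are all made up for the example. Only the names `Solution`, `scix_id`, `bibcode`, and the `scixid` label come from this discussion.

```python
class Solution:
    """Simplified stand-in for the service's Solution class (assumed shape)."""

    def __init__(self, cited_bibcode, score, scix_id=None):
        self.cited_bibcode = cited_bibcode
        self.score = score
        self.scix_id = scix_id

    def __str__(self):
        # Assumed layout: the score first, then the labelled bibcode and SciX ID.
        return "%.1f bibcode:%s scixid:%s" % (
            self.score, self.cited_bibcode, self.scix_id)


def parse_solution_string(text):
    """Hypothetical position-based parser, in the spirit of format_resolved_reference."""
    score, bibcode_part, scixid_part = text.split()
    return {
        "score": score,
        # Split on the first colon only, since the SciX ID value itself
        # may contain a colon.
        "bibcode": bibcode_part.split(":", 1)[1],
        "scix_id": scixid_part.split(":", 1)[1],
    }
```

Because the parser relies on the position of each field, the stringified form and the parser have to change together whenever a field is added.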

@ehenneken ehenneken requested a review from Thomas-S-Allen April 1, 2026 12:55

@Thomas-S-Allen Thomas-S-Allen left a comment

Line 295 in solve.py: if scix_id or bibcode is missing, this line could break. You could use sol.get("scix_id", None) and sol.get("bibcode", None) instead. The same goes for the logging code on line 291.
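
As a small illustration of the suggestion, assuming `sol` is a plain dict holding one Solr document (the helper name `extract_ids` is hypothetical and not part of solve.py):

```python
def extract_ids(sol):
    """Return (bibcode, scix_id) from a Solr document dict.

    dict.get returns None for a missing key instead of raising KeyError,
    so records lacking either field do not break the caller.
    """
    return sol.get("bibcode", None), sol.get("scix_id", None)
```

With this, `extract_ids({"bibcode": "2020ApJ...900....1A"})` yields the bibcode and `None` for the SciX ID rather than raising an exception.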

This brings up another question: it seems that the Solution class, lines 257-290 in common.py, requires cited_bibcode. Should it be set to cited_bibcode=None in the initialization on line 267? Similarly, should there be a cited_scix_id and citing_scix_id?

Also, with the Solution class, do you always want both bibcode and scix_id to be part of the return string, or should it be conditional in case one is not available? Making it conditional would require changing lines 126-129 in views.py to no longer parse based on position. You could use a regex instead to match bibcode:... and scixid:... separately, or possibly another method that is not position-based.

Also, if the return string, line 287 in common.py, had scix_id instead of scixid, it would be more consistent with how scix_id is used throughout the code. Or would that cause more difficulty downstream in views.py, lines 126-129?
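
A rough sketch of the label-based (rather than position-based) parsing mentioned above. The function name `parse_ids` and the assumption that each value runs up to the next whitespace are illustrative only and are not taken from views.py:

```python
import re

# Each pattern looks for its own label, so a missing field or a change in
# field order does not affect the other one.
BIBCODE_RE = re.compile(r"\bbibcode:(\S+)")
SCIXID_RE = re.compile(r"\bscixid:(\S+)")


def parse_ids(text):
    """Extract the labelled identifiers from a stringified solution."""
    bibcode = BIBCODE_RE.search(text)
    scixid = SCIXID_RE.search(text)
    return {
        "bibcode": bibcode.group(1) if bibcode else None,
        "scix_id": scixid.group(1) if scixid else None,
    }
```

Each identifier is matched independently, so a string that carries only one of the two still parses, with the other reported as `None`.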

@ehenneken
Member Author

> Line 295 in solve.py: if scix_id or bibcode is missing, this line could break. You could use sol.get("scix_id", None) and sol.get("bibcode", None) instead. The same goes for the logging code on line 291.

Technically, Solr records should always have a bibcode or scix_id field, and definitely always a scix_id. But there is no harm in implementing the changes you suggest.

> This brings up another question: it seems that the Solution class, lines 257-290 in common.py, requires cited_bibcode. Should it be set to cited_bibcode=None in the initialization on line 267? Similarly, should there be a cited_scix_id and citing_scix_id?

I don't understand why citing_bibcode was implemented in the first place. There is no functionality using it. I'm just leaving it in for now. So, there is no need to introduce a citing_scix_id. I prefer to keep the code as is, i.e. with just scix_id.

> Also, with the Solution class, do you always want both bibcode and scix_id to be part of the return string, or should it be conditional in case one is not available? Making it conditional would require changing lines 126-129 in views.py to no longer parse based on position. You could use a regex instead to match bibcode:... and scixid:... separately, or possibly another method that is not position-based.

Yes, my preference is to keep both bibcode and scix_id as part of the returned results.

> Also, if the return string, line 287 in common.py, had scix_id instead of scixid, it would be more consistent with how scix_id is used throughout the code. Or would that cause more difficulty downstream in views.py, lines 126-129?

I decided to use scixid to distinguish it from the Solr field scix_id. Keeping both scixid and scix_id provides a means to distinguish between two different uses: scix_id as a variable holds the value of the Solr field with the same name, while the string scixid makes it immediately clear that this is not a variable but the string in the output (which should be distinguishable from the value it carries, which starts with scid_id:).

@ehenneken
Member Author

For the additional commit to this PR:

I'm leaving the Solution instantiation the way it was originally implemented, with the addition of scix_id as a named variable. The cited_bibcode is always defined (potentially with value None).

The biggest additions in this latest commit are the cases where Solution instances are created when doubtful matches are considered. It is in the cases where the Undecidable exception is raised (like here) that the SciX ID has to be included in the list considered_solutions.

Things get a little tricky here because there are explicit assumptions that matches just have bibcodes and scores. The current solution is a work-around that seems adequate.
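
To illustrate the kind of adjustment involved, here is one way such a work-around could look. It is not necessarily what the commit does: the (score, bibcode) tuple layout, the lookup dict, and the helper name `attach_scix_ids` are all assumptions made for this sketch.

```python
def attach_scix_ids(considered_solutions, scix_id_by_bibcode):
    """Extend (score, bibcode) pairs to (score, bibcode, scix_id) triples.

    considered_solutions: list of (score, bibcode) tuples (assumed layout).
    scix_id_by_bibcode: dict mapping a bibcode to its SciX ID (hypothetical).
    Bibcodes without a known SciX ID get None.
    """
    return [
        (score, bibcode, scix_id_by_bibcode.get(bibcode))
        for score, bibcode in considered_solutions
    ]
```

Code that still expects plain (score, bibcode) pairs would need to be updated to unpack the third element, which is where the explicit assumptions mentioned above come into play.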
